Human activities, and more generally the phenomena related to human behaviour, take place in a network-constrained subset of geographical space. These phenomena can be expressed as locations whose positions are configured by a road network, such as address points with street numbers. Although these events are considered as points on a network, point pattern analysis and the techniques implemented in a GIS environment generally consider events as taking place in a uniform space, with distance expressed as Euclidean and over a homogeneous and isotropic space. Network-spatial analysis has developed as a research agenda where attention is drawn towards point pattern analytical techniques applied to a space constrained by a road network. Little attention has been paid to first order properties of a point pattern (i.e. density) in a network space, while mainly second order analyses such as nearest neighbour and K-functions have been implemented for network configurations of geographical space. In this article, a method for examining clusters of human-related events on a network, called Network Density Estimation (NDE), is implemented using spatial statistical tools and GIS packages. The method is presented and compared to conventional first order spatial analytical techniques such as Kernel Density Estimation (KDE). Network Density Estimation is tested using the locations of a sample of central, urban activities associated with bank and insurance company branches in the central areas of two midsize European cities, Trieste (Italy) and Swindon (UK).
Many human-related events taking place in geographical space are referred to or constrained by network-led spatial configurations such as the road transport network at an urban or extra-urban level. This article considers density estimation of point patterns in a network rather than in a conventional, Euclidean continuous space.
Point pattern analysis is one of the most used methods in spatial data analysis (Yamada and Rogerson 2003). The interest usually lies in the characteristics of the point pattern relative to some hypothesised process, such as the independent random process (Bailey and Gatrell 1995), and the methods are used in many fields of research, including geography, economics, demography, criminology, ecology, epidemiology, and biology. Methods for analysing first-order properties, which describe the way in which the expected value of the spatial point pattern varies across space (intensity), are quadrat analysis and Kernel Density Estimation (KDE), while second-order properties are explored by means of other functions, for instance nearest neighbour distances and Ripley’s K-function (Ripley 1976, 1981). The K-function is sometimes called the reduced second order measure (Bailey and Gatrell 1995), as it is designed to measure effects at different scales, those implying first order effects and second order trends, such as local clustering or a general pattern over the region.
Many phenomena can be considered as point processes and therefore studied by means of point pattern analysis. This is also the case for many human related phenomena that can be georeferenced as point events in space at different levels of precision, such as postcodes or address points, national grid references and latitude and longitude coordinates. Analyses of point distributions generally use algorithms and procedures that calculate Euclidean distances and consider space as continuous, homogeneous and isotropic. In analyses related to social and economic phenomena, however, this is a limitation, as many human-related phenomena are distributed over non-homogeneous spaces such as network-constrained structures. Residents’ locations, shopping centres and bank ATMs are based on street addresses, while ‘events’ such as robberies and car accidents take place on networks or are located close to them. In recent years researchers have attempted to consider networks in a GIS environment, with Batty (2005) in particular seeing this as one of the major issues in analysing spatial phenomena and representation in GIS. Miller (1999) notes that the assumption of a continuous planar space is too strong for analysing events that actually occur in a one-dimensional subset of this space, and Yamada and Thill (2004) recall the greater efficiency of shortest-path versus Euclidean distance measures. Some authors have proposed methods for analysing point patterns over network structures. This is the case of nearest neighbour distances on networks (Okabe et al. 1995), as well as the network K-function (Okabe and Yamada 2001) and its applications to events on networks such as car accidents (Yamada and Thill 2004, 2007). Several adaptations of these network-based methods to market area analysis have been performed. 
Harvey Miller has extended to networks methods originally implemented on a Euclidean space, such as space-time accessibility measurements (Miller 1999) and the network Huff model of spatial interaction (Miller 1994). The latter was also examined by Okabe and Kitamura (1996) with a focus on market area analysis on networks, while Okabe and Okunuki (2001) implemented it in a GIS environment. More recently a series of spatial analytical tools for use in a GIS environment have been implemented to facilitate spatial point pattern analyses on networks (Okabe and Yoshikawa 2003; Okabe et al. 2006a, b).
Little attention, however, seems to be paid to first order properties of point patterns in a network space, particularly in the analysis of the intensity of event features at a local level along a network. Only recently Borruso (2005) and Downs and Horner (2007a,b) have proposed applications of point pattern analysis adapting a network structure to a Kernel Density Estimation. They respectively considered the network density of intersections in the urban road network as an indicator of urban centrality in the first case, and networks of movement trajectories of animals to estimate their home ranges in the second case.
This article presents a procedure for estimating the density of a point pattern of human activities over a network and compares it with a more traditional analysis in a Euclidean space. A method inspired by KDE, and particularly related to the naive estimator, called Network Density Estimation (NDE), is proposed and compared with the traditional, Euclidean method. In Section 2 some of the main features of point pattern analysis are briefly reviewed. Section 3 introduces network density estimation. In Sections 4 and 5 an application is carried out which examines the differences between traditional and network methods. The locations of bank and insurance branches in the central areas of two European cities (Trieste, Italy in Section 4 and Swindon, UK in Section 5) are used as a sample of city-centre (CBD) activities. Results from the two procedures are compared. Section 6 suggests future developments of the procedure while Section 7 contains concluding remarks.
When dealing with a point pattern, authors such as Gatrell et al. (1996) define events as the observed locations in a distribution, and points as all the other locations in the study area. Different levels of observations and analysis are proposed. The simple visualization of an event distribution over space by means of dot maps can provide initial information on the structure of the distribution, but more refined analytical instruments are needed for more in depth analysis, and particularly to identify clusters or regularity in the distribution relative to an assumed model, usually that of complete spatial randomness (CSR).
Quadrat analysis is one of the means of ordering the pattern of a distribution of events within a region R. It involves dividing the study region into sub-regions of equal and homogeneous area (quadrats) and then counting the number of events falling in each quadrat in order to simplify the spatial distribution.
The number of events therefore becomes an attribute of the quadrat. It is then possible to represent the spatial distribution by means of homogeneous and easily comparable areas, as GIS packages allow visualizing the phenomenon via colour-thematic representation of quadrats. Density analyses are also easy to compute (Gatrell 1994, Bailey and Gatrell 1995, Gatrell et al. 1996). The method has some disadvantages, such as the loss of information from the original data and the arbitrariness of the chosen quadrat shape, dimension, orientation and origin. Different results could be obtained simply by changing the grid origin or dimensions. One way of overcoming these limitations involves considering the number of events per unit area within a mobile 'window' of fixed radius centred at a number of points in the region R. An estimate of the intensity at each point of the grid is therefore provided. This generates an estimate of the variation of the intensity that is smoother than the one obtained from a fixed grid of superimposed square cells. This method is the so-called 'naive' method of a group of procedures called Kernel Density Estimation (KDE). The kernel consists of a family of "moving three-dimensional functions that weight events within its sphere of influence according to their distance from the point at which the intensity is being estimated" (Gatrell et al. 1996).
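The general form of a kernel estimator, summed over the n observed events, is:

$$\lambda(s) = \sum_{i=1}^{n} \frac{1}{\tau^{2}}\, k\!\left(\frac{s - s_i}{\tau}\right)$$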
where λ(s) is the estimate of the density of the spatial point pattern measured at location s, s_i is the observed ith event, k() represents the kernel weighting function and τ is the bandwidth. The KDE function allows one to estimate the intensity of a point pattern and to represent it by means of a smoothed three-dimensional continuous surface that represents the variation of density of point events across the study region. The procedure can be organized in three steps (Chainey et al. 2002):
The routine therefore calculates the distance between each of the reference cells and the event's locations, evaluates the kernel function for each measured distance and sums the results for each reference cell (Levine 2004).
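The routine just described can be sketched in a few lines of Python. This is only a minimal illustration, not the software used in the article: the function names `quartic_kernel` and `kde_grid`, the choice of a quartic kernel with a 3/π normalising constant, the grid origin and the toy event coordinates are all assumptions introduced here.

```python
import math

def quartic_kernel(u):
    """Quartic kernel for a scaled distance u = d / tau (assumed form)."""
    if u >= 1.0:
        return 0.0
    return (3.0 / math.pi) * (1.0 - u * u) ** 2

def kde_grid(events, x0, y0, n_cols, n_rows, cell, tau, kernel=quartic_kernel):
    """Return a row-major grid of intensity estimates (events per unit area)."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Reference location: the centroid of the grid cell.
            sx = x0 + (c + 0.5) * cell
            sy = y0 + (r + 0.5) * cell
            total = 0.0
            for ex, ey in events:
                d = math.hypot(sx - ex, sy - ey)     # Euclidean distance to the event
                total += kernel(d / tau) / tau ** 2  # kernel contribution of the event
            row.append(total)                        # sum over all events for this cell
        grid.append(row)
    return grid

# Hypothetical events (metres) on a 10 x 10 grid of 20 m cells, 125 m bandwidth.
events = [(50.0, 50.0), (60.0, 50.0), (150.0, 150.0)]
surface = kde_grid(events, 0.0, 0.0, n_cols=10, n_rows=10, cell=20.0, tau=125.0)
```

In this toy case the cell whose centroid coincides with the first event receives the highest estimate, while a corner cell farther than 125 m from every event receives zero.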
The function has many advantages compared with other techniques, as it allows estimation of the density at any location in the study region (O'Sullivan and Unwin 2003) while preserving the total number of events. It also allows a field representation of the phenomenon by means of a smooth, three-dimensional continuous surface in which peaks represent the presence of clusters or 'hot spots' in the distribution of events. The arbitrary variables in the KDE are the bandwidth (Gatrell et al. 1996) and the grid cell size [1]. Different bandwidths allow analysis of the phenomena at different scales, as a wider bandwidth visualizes a more general trend over the study region and smooths the spatial variation of the phenomenon, while a narrower bandwidth highlights more local effects such as 'peaks and valleys' in the distribution. The choice of the bandwidth also depends on the size of the sample, as sparser events are generally better evaluated using a larger bandwidth, since a narrower one will not provide much more information than the simple observation of the event distribution in a dot map or scatter plot. When the bandwidth is fixed the search radius is constant over the study region, but alternatively an adaptive bandwidth can be used (Silverman 1986, Brunsdon 1995). Most authors emphasize that the choice of bandwidth is more important than the choice of weighting function, as the statistical results are not significantly affected by the various kernel functions (Epanechnikov 1969) [2].
Different weighting functions can also be used. Levine (2004) summarizes five different weighting functions: normal, uniform, quartic, triangular and negative exponential. The normal function superimposes a bell-shaped function over any location in the region, extending to infinity in all directions, therefore weighting all the points in the study region, with closer points weighted more than distant ones. It is one of the most used functions in KDE, although many authors prefer a quartic function (Bailey and Gatrell 1995). The other four functions can be considered as circumscribed ones, as they strictly search for events within a radius (bandwidth) centred in each grid reference cell.
KDE surfaces realised using a normal function appear smoother than those realised using the other functions. Quartic and uniform functions tend to produce smooth surfaces from the data as well, although less smooth than those obtained via a normal function. Triangular and negative exponential functions produce more ‘spiky’ areas, emphasizing ‘peaks’ and ‘valleys’ of the data distribution.
The choice of a particular function should therefore follow the user’s aim of highlighting a more general trend rather than finer variations (see also Atkinson and Unwin 1998), as well as the possible need of assigning different weights to near points over far points (Levine 2004).
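To make the contrast between the weighting functions concrete, the sketch below writes each of the five as an unnormalised weight of the distance d for a bandwidth tau. The exact formulas and constants used by Levine (2004) are not reproduced here: treating tau as the standard deviation of the normal function and the `decay` rate of the negative exponential are assumptions made purely for illustration.

```python
import math

def normal(d, tau):
    # Unbounded: every event receives some weight (tau assumed to be the std. dev.).
    return math.exp(-0.5 * (d / tau) ** 2)

def uniform(d, tau):
    # Every event inside the bandwidth counts equally (the naive estimator).
    return 1.0 if d <= tau else 0.0

def quartic(d, tau):
    # Smooth decline that reaches zero at the bandwidth.
    return (1.0 - (d / tau) ** 2) ** 2 if d <= tau else 0.0

def triangular(d, tau):
    # Linear decline with distance.
    return 1.0 - d / tau if d <= tau else 0.0

def negative_exponential(d, tau, decay=3.0):
    # Rapid decline near the centre; `decay` is a hypothetical rate parameter.
    return math.exp(-decay * d / tau) if d <= tau else 0.0

tau = 125.0
for d in (0.0, 62.5, 125.0, 200.0):
    weights = [f(d, tau) for f in (normal, uniform, quartic, triangular, negative_exponential)]
    print(d, [round(w, 3) for w in weights])
```

At 200 m, beyond the 125 m bandwidth, only the normal function still assigns a non-zero weight, which is why it yields the smoothest surfaces; the triangular and negative exponential weights fall away fastest near the centre, which is consistent with their 'spikier' output.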
KDE finds clusters in point pattern distributions over a study area, particularly highlighting ‘circular’ clusters. However, clusters can follow different distribution schemes in network-led spaces, for example, ‘quality’ shops in city centres tend to cluster along high streets while out-of-town ‘big box’ shopping centres are distributed along major roads, therefore forming ‘linear’ clusters.
Network Density Estimation (NDE) involves a modification of the search function from one based on the Euclidean distance to a network-based one in which the bandwidths are calculated as shortest paths departing from each grid cell’s centre and following the segments composing the network. Each search area therefore consists of a bandwidth-defined shortest-path tree and its bounding polygon, with shapes that vary with the network’s structure (Figure 1). Furthermore, density analysis on a network involves the definition of a network space, which consists of the subset of the geographical space close to the network itself, and therefore that part of the geographical space that can be to some extent considered ‘affected’ by the presence of a network (i.e. in an urban environment, the pavements facing a road, the street fronts of buildings, etc.).
In order to test the procedure and compare it with the traditional function based on Euclidean distance, an analysis is conducted using a network search function conceptually similar to the naive estimator and the uniform function used in KDE. In these latter functions, a moving window is placed over the study region, visiting each location for which a density estimate is required and counting the events falling inside. In NDE the moving window is not circular as in the naive estimator or in the family of kernel functions of KDE, but of variable shape, as it is built as a network service area and therefore depends on the road network structure (see Figure 1 for the differences between the two search functions). In NDE a fine resolution grid is superimposed over the study region and the grid cells' centroids are computed and used as reference locations for the density estimation. Events are counted according to whether or not they belong to the area defined by the bandwidth on the network, while no weighting function is applied when moving out from the reference cell's centre towards the service area's boundary. In this formulation of the NDE, points closer to a reference cell's centre are given the same importance as farther ones, as their contribution to the density function is given only by their belonging to each shortest path tree. The function applied here is not volume preserving, as in Kernel Density Estimation where the aim is to obtain a smooth estimate of a univariate or multivariate probability density (Bailey and Gatrell 1995), or pycnophylactic (Tobler 1979): the NDE function produces a 'pure' density value expressed in terms of both 'events per (linear) kilometre' and 'events per square kilometre' for each point (reference cell) at which the density is computed.
The intensity value at any given location (reference cell) can therefore be obtained by counting the number of events belonging to its shortest path tree and then dividing this count either by the overall length of the segments that compose the shortest path tree, which gives a linear density, or by the surface of the service area bounding the shortest path tree, which gives an areal density (Fotheringham et al. 2002).
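The linear version of this computation can be sketched as follows. This is an illustrative reading of the procedure, not the GIS workflow used in the article. It assumes that the reference location coincides with a network node, that events have already been projected onto network edges and are stored as an edge identifier plus an offset in metres from the edge's first node, and that edges reached only partially within the bandwidth contribute only their reached length. The names `network_distances` and `linear_nde` and the toy network are hypothetical.

```python
import heapq

def network_distances(adj, source, bandwidth):
    """Dijkstra from source, keeping only nodes reachable within the bandwidth."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adj[u]:
            nd = d + length
            if nd <= bandwidth and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def linear_nde(edges, events, source, bandwidth):
    """edges: {edge_id: (u, v, length_m)}; events: [(edge_id, offset_from_u_m)].
    Returns (event_count, reached_length_m, events_per_linear_km)."""
    adj = {}
    for u, v, length in edges.values():
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    dist = network_distances(adj, source, bandwidth)
    inf = float("inf")

    # Length of network covered by the shortest path tree, including partial edges.
    reached = 0.0
    for u, v, length in edges.values():
        from_u = max(0.0, bandwidth - dist.get(u, inf))
        from_v = max(0.0, bandwidth - dist.get(v, inf))
        reached += min(length, from_u + from_v)

    # An event is counted when its network distance from the source is within the bandwidth.
    count = 0
    for edge_id, offset in events:
        u, v, length = edges[edge_id]
        via_u = dist.get(u, inf) + offset
        via_v = dist.get(v, inf) + (length - offset)
        if min(via_u, via_v) <= bandwidth:
            count += 1

    density = count / (reached / 1000.0) if reached > 0 else 0.0
    return count, reached, density

# Toy network: three arcs leaving or following on from reference node "O".
edges = {"a": ("O", "A", 100.0), "b": ("A", "B", 100.0), "c": ("O", "C", 50.0)}
events = [("a", 40.0), ("b", 20.0), ("b", 60.0), ("c", 50.0)]
count, reached, density = linear_nde(edges, events, "O", bandwidth=125.0)
print(count, reached, round(density, 2))
```

With a 125 m bandwidth the tree covers 175 m of the toy network and captures three of the four events. The areal variant would divide the same count by the area of the polygon bounding the tree, which requires a geometry library and is not shown.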
Figure 1 Differences in search area between Euclidean and network. Thin black segments belong to the road network: (a) the dashed circle is the service area computed using a Euclidean distance from the reference cell’s centroid; the thick black segment is the search radius (bandwidth); and (b) in light grey the service area computed using a network distance; dark-grey segments belong to the shortest path tree; dashed line is the bandwidth computed as shortest path on the network
As reported by Silverman (1986), the naive estimator is not fully satisfactory when compared to the more refined Kernel Density Estimator, where a continuous function is placed as a 'hump' on each reference cell, with the naive estimator giving a 'boxed' and ragged visualization of the density estimate. Technically the naive estimator is not continuous, nor does it integrate to unity, both of which are desirable properties maintained in Kernel Density Estimation. However, in the NDE applied here the 'box' effect is quite limited and a certain level of smoothing in the visualization is maintained, as the function varies continuously over the study region, moving from one cell to a contiguous one.
The following section outlines the workflow necessary to perform a network density estimation of a point process over a study area. The actions performed in the different steps can be realized using desktop GIS and standard spatial analytical tools. The first four steps involve the preparation of the dataset, while the density analysis itself is performed from Step 5.
Figure 2 Events and their mirror locations over a network. Light grey segments belong to the entire road network; the circular buffer is in dashed black line; the network service area is in medium grey; the shortest path tree is in dark grey. The light stars represent sample events while the dark stars are their mirror locations on the network. Note the locations of points A and A′. Point A is included in the network service area but it is not reachable through the shortest path tree from point O. Its mirror location A′ is not reached by the shortest path tree and therefore is not selected for the density estimation
These ideas have been tested using a dataset consisting of the locations of bank branches and insurance companies in the central area of the Municipality of Trieste (Northeastern Italy, Figure 3a) in order to highlight clusters of financial services and test for the existence of a 'financial district'. Banks and insurance companies are among the main human activities considered in urban geography studies for the definition of the Central Business District (Murphy and Vance 1954b). Their concentration is generally high in central areas and usually coupled to land values higher than in less central and more peripheral parts of a city. The land value function as well as the density functions generally decrease as the distance from the city centre increases, and can present variations and lower intensity peaks located at minor settlements and at the intersection of major arterial roads. Murphy and Vance (1954a) in particular used the concentration of central activities to define and delineate the shape and extension of the CBD in urban areas. The use of density estimators, and particularly network-based ones, could be useful for revisiting such research, given the more realistic approximation of the urban structure they allow [10].
The urban road network was drawn in the study area, together with the point dataset represented by the locations of banks and insurance companies. The analysis used a point dataset consisting of 109 events representing bank branches and insurance companies. These data were collected from the Italian Yellow Pages and georeferenced to address points. A grid consisting of 48 rows and 87 columns of 20 m cells was superimposed over the study region, covering an area of 1,670,400 m². A road network of 512 arcs with a total length of 34,470 m was also used (Figure 3b).
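As a small check on these figures, the reference grid can be generated and its extent verified with a few lines of Python. The helper name `grid_centroids` and the (0, 0) origin are placeholders introduced here; the real grid would be anchored at the study region's projected coordinates.

```python
def grid_centroids(x0, y0, n_rows, n_cols, cell):
    """Row-major list of (x, y) cell centroids for a regular square grid."""
    return [
        (x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell)
        for r in range(n_rows)
        for c in range(n_cols)
    ]

N_ROWS, N_COLS, CELL = 48, 87, 20.0  # values reported for the Trieste grid
centroids = grid_centroids(0.0, 0.0, N_ROWS, N_COLS, CELL)  # placeholder origin
area_m2 = N_ROWS * N_COLS * CELL ** 2

print(len(centroids))  # 4176 reference cells
print(area_m2)         # 1670400.0, matching the stated study area
```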
Figure 3 (a) The municipality of Trieste with road network (light grey), bank and insurance companies (black dots) and the position of the study region (black box); (b) the study region. The grid cells are displayed as reference locations for the density analysis (light grey grid), as well as the urban street network (dark grey lines) and distribution of banks and insurance companies (black dots). Centroids for three selected reference cells are displayed (grey dots) as well as their search functions considering the Euclidean distance (thick dashed grey circles) and network distance on shortest path tree (thick dashed black line polygons); and (c) the study region with shortest path trees for three selected reference cells (thick black lines), distribution of banks and insurance companies (black dots) and their mirror locations on the network (grey dots)
The analysis was performed following the steps described above and therefore shortest path trees and network-service areas from each reference cell were drawn [11] using a 125 m bandwidth [12] . This distance was used after several simulations testing alternatives.
The choice of a 125 m bandwidth followed considerations on the micro, urban scale of the analysis and on the aim of observing the distribution of banks and insurance branches at a very detailed level. It was observed that at this scale of analysis a bandwidth lower than 125 m produced an overly 'spiky' representation of the phenomenon, providing, in extreme cases, not much more information than the simple observation of the point distribution. On the other hand, bandwidth values higher than 250 m caused an excessive dilution of the spatial pattern. It is worth noting that the study region considered here is an area of high concentration of banks and insurance companies as well as of other 'central' human activities in the municipality of Trieste. A traditional 400 m bandwidth quartic Kernel Density Estimation previously performed over the entire area of the municipality highlighted a single peak in the density function of such activities in the central area of the city (Borruso 2006), which is under investigation here [13]. A narrower bandwidth therefore allows a more in-depth local analysis of this peak area.
Shortest path trees were used to count the number of banks and insurance companies lying on the network within the 125 m bandwidth distance [14] and the count values were assigned as attributes of the reference cells. Counts were divided by the overall length of each shortest path tree to obtain relative densities in terms of events per linear kilometre for each reference cell. The 20 m grid cells used represent a discrete approximation of a continuous space and did not conflict with the 125 m bandwidth. Different authors stress the importance of the bandwidth rather than the grid cell size, as the latter mainly determines a finer or coarser resolution of the three-dimensional density function obtained. In fact, de Smith et al. (2007) point out that "the grid resolution does not affect the resulting surface form to any great degree", while O'Sullivan and Wong (2007) recall that a grid resolution substantially smaller than the bandwidth, by a factor of five or more and minimally by a factor of two, affects the density estimate negligibly.
These values were then interpolated in order to obtain a continuous surface to be represented both as a traditional two-dimensional graph and as a three-dimensional map. Inverse Distance Weighting was used to interpolate the density values, using actual distance and relying on spatial resolutions of 20 m for the computed grid. The interpolator was given a power of 1, which provided a better smoothing and shaping of the density distribution. This was done after testing different values, such as 2 and 3, that did not produce relevant differences in the overall shape of the density surface, although the resulting distribution was spikier.
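The effect of the power parameter can be illustrated with a minimal Inverse Distance Weighting function. This sketch is not the interpolation software used in the article: the name `idw`, the use of all sample points without a search radius, and the two toy samples are assumptions made here.

```python
import math

def idw(samples, x, y, power=1.0):
    """samples: [(sx, sy, value)]. Inverse-distance-weighted estimate at (x, y)."""
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value          # the location coincides with a sample point
        w = 1.0 / d ** power      # closer samples receive larger weights
        num += w * value
        den += w
    return num / den

# Two hypothetical reference cells 20 m apart with densities 10 and 2.
samples = [(0.0, 0.0, 10.0), (20.0, 0.0, 2.0)]
low_power = idw(samples, 5.0, 0.0, power=1.0)   # about 8.0
high_power = idw(samples, 5.0, 0.0, power=2.0)  # about 9.2
```

At a point 5 m from the first sample, a power of 1 gives an estimate of about 8.0, while a power of 2 gives about 9.2: the higher power pulls the estimate towards the nearest sample, which is consistent with the spikier surfaces reported for powers of 2 and 3.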
To accomplish this, the reference cells' densities and locational coordinates were exported from the GIS environment and processed using surface interpolation software [15]. Improvements to the density estimation, such as adaptive bandwidths and edge-effect corrections, as suggested by some authors (Silverman 1986, Bracken 1994), were not considered at this stage.
Figures 4 and 5 show three- and two-dimensional visualisations of the network density analysis. Peaks can be noticed mainly in the centre-west and south-west of the study area. These peaks correspond to the city centre, where a higher clustering of events, represented by higher values of the density surface, is observed. Two main clusters can be identified in the south-west part of the study region, where the highest densities are found. Other minor clusters are visible in the centre-east of the study region, with a few cases along main roads. A series of small elongated clusters following a north-northeast to south-southwest line can also be seen in the western part of the study region. This corresponds to a major seaside road that bounds the study region, on one side of which several banks and insurance companies are located.
Figure 4 NDE on bank and insurance branches, 125 m bandwidth [linear density] (3D)
Figure 5 NDE on bank and insurance branches, 125 m bandwidth [linear density] (2D)
A similar network analysis was performed to produce a density estimation of events per square kilometre of the road network, which consists of dividing the number of events falling within the 125 m bandwidth computed over the network by the overall area of the service area for each reference point. As in the previous case, the single cell values were interpolated in order to obtain a continuous surface. The results are displayed in Figures 6 and 7.
Figure 6 NDE on bank and insurance branches, 125 m bandwidth [area densities] (3D)
Figure 7 NDE on bank and insurance branches, 125 m bandwidth [area densities] (2D)
It is clear that normalising the values by the network length rather than by the service areas in the kernel does not greatly affect the overall shape of the density surfaces, with very similar clusters identified in the two analyses. The derived density values are different in absolute terms and the linear network density appears to be a little smoother than the area normalised one, although the main clusters of events can be easily recognized in the south-west and centre-west parts of the study region. Area-NDE can be used for direct comparison with more traditional KDE while Linear-NDE is more consistent with transport analysis where density values are expressed in events per linear km.
Figure 8 Uniform KDE on bank and insurance branches, 125 m bandwidth (3D)
Figure 9 Uniform KDE on bank and insurance branches, 125 m bandwidth (2D)
These NDE results can be compared to the conventional KDE that inspired this kind of analysis. A Uniform KDE was computed on the same dataset of events using standard spatial statistical software [16]. The same 20 m cell resolution was maintained, as was the 125 m bandwidth, which now defines a straight-line Euclidean distance.
For comparison purposes, the simple naive estimator was used instead of more complex kernel functions. Figures 8 and 9 show the results of KDE in three- and two-dimensional visualisations.
Table 1 Comparison between Network Density Estimator using area and linear densities and uniform Kernel Density Estimator (Trieste, Italy)
| | NDE (linear density), events per linear km | NDE (service area density), events per sq km | KDE (uniform), events per sq km |
| --- | --- | --- | --- |
| max | 17.12 | 511.13 | 397.33 |
| min | 0.86 | 31.39 | 20.91 |
| mean | 4.26 | 132.21 | 98.78 |
| st. dev | 2.81 | 86.36 | 82.25 |
| not null cells | 1,748 | 1,748 | 2,779 |
The KDE highlights peaks in the distribution in the same areas as the Network Density Estimation. Nevertheless, the peaks in the south-west part of the study area are merged to form a single elongated cluster, less consistent with the road network orientation and shape, and other areas look denser. The two clusters in the central part of the study region seem to follow an arched shape oriented north-south, and a minor cluster appears in the eastern part. With the Network Density Estimation, by contrast, the two clusters in the central part of the region appeared to be oriented differently: the southernmost of the two is oriented along a major road that starts from the centre of the study region and follows a west-southwest to east-northeast orientation, together with two other minor clusters along the same road.
It is also worth noting that in the KDE analysis there is less marked evidence of the linear cluster along the major seaside road highlighted by NDE, which is replaced here by some faint circular clusters.
The density values expressed as number of events per linear kilometre obtained performing NDE are generally higher than those for KDE (Table 1). Similarly mean and standard deviation are higher in NDE while the numbers of null cells obtained from the two analyses are quite different, with NDE presenting a considerably lower number of 'not null' cases. In the NDE analyses the cells considered are only those close to the network itself, in this case within 20 m of the network, and in any case not farther than the 125 m bandwidth, as the density analysis produces results limited to the network subset of the study region's extension. Cells that are outside of these 'network ranges' are therefore assigned a null value, as they are not accessible, or only poorly accessible, from the network itself. The table also reports density values expressed in terms of area density (events per square kilometre). A direct comparison in terms of density value between uniform KDE and linear NDE is not possible apart from the visual impact of the two distributions, although a comparison can be made considering the areal NDE, as the visual results from the two different versions of NDE (areal vs. linear) are not very different from each other. Although NDE is performed in terms of both linear and areal densities, the linear, network-oriented approach behind NDE means that linear densities seem to better represent the 'philosophy' of a network-driven analysis. It can nevertheless be useful to consider both kinds of NDE, particularly when comparing different network-constrained environments, as different cities or parts of the same city are characterized by different network structures. The comparison between linear and areal network densities in two different network environments can in fact help in better understanding the spatial distribution of events and whether the conditions exist for linear clusters or whether 'traditional' circular clusters dominate.
Figure 10 (a) The Borough of Swindon with road network (light grey), bank and insurance companies (black dots) and the position of the study region (black box); (b) the study region. The grid cells are displayed as reference locations for the density analysis (light grey grid), together with the urban street network (dark grey lines) and distribution of banks and insurance companies (black dots) as well as their mirror locations (grey dots)
Differences between the two analyses are, however, not very marked, although NDE seems to be more effective than KDE, for a given bandwidth, in highlighting clusters at the local level. In the case study presented, the general lack of clear differences between the two analyses could be a consequence of the characteristics and orientation of the street network in the urban area considered, where a Manhattan-like structure dominates and therefore a high regularity in the street network pattern can be identified. However, where a few long roads dominate the network structure and events are distributed along them, the Network Density Estimator allows the visualization of linear clusters along the network. A first conclusion that could be drawn is that in urban areas where the network structure is particularly compact and events are intensively distributed in space, the differences between KDE and NDE are minimal, while NDE performs more consistently in areas where major streets or roads, such as high streets or main roads connecting the central part of a city to the outer parts, shape the network structure and the distribution of point events is organized around them.
In order to compare the results from the NDE, the procedure was tested on a different urban area using data of a similar nature. In particular, the area considered was the urban area of the Borough of Swindon (UK), where bank and insurance company locations were considered within a study region selected in the city centre (Figure 10a). The area was chosen because of its similarity in size to Trieste, in terms of both population and area, although banks and insurance companies seem less present and less concentrated in Swindon than in Trieste. The urban road network in the study area was drawn from data extracted from an OSCAR dataset (Ordnance Survey ?), together with a point dataset of 32 bank and insurance company locations. Data were collected from UK Yellow Pages and georeferenced to unit postcodes. A grid consisting of 70 rows and 67 columns of 20 m cells was superimposed over the study region, covering an area of 1,876,400 m², and a road network of 374 arcs with a total length of 26,519 m was also used (Figure 10b).
Figure 11 NDE on bank and insurance branches, 125 m bandwidth [linear density] (3D)
The analysis was conducted following the same steps as in the case of Trieste. The same 125 m bandwidth was used on the Swindon dataset in order to rely on a similar scale of analysis for the density estimation. The density estimation highlights two sub-regions of the study area where clustering of bank and insurance company branches takes place. These areas are located in the north-western and south-eastern corners of the study area respectively. In the north-western part of the region a main cluster can be noted to the west, with two other clusters along a main road: the northern one is more elongated along a northwest-southeast road, and the southern one is located at a crossing between two main roads. In the southern area other clusters can be identified. The peaks in the distribution are less evident than those located in the northern part, although a similar pattern can be noticed here as well, particularly a more evident long cluster towards the south-eastern corner of the study region, with an elongated shape along a northwest-southeast oriented road. A first analysis was conducted considering density expressed as events per linear km (Figures 11 and 12): shortest-path trees were computed for each reference cell and used to count the events intersecting them, and the count was then divided by the overall length of the shortest-path trees.
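The linear density computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes that the reference cell has been snapped to a network node, that each event is stored as an offset in metres along an arc, and that the search follows every arc portion reachable within the bandwidth. The function names (`network_distances`, `linear_nde`) and the data layout are hypothetical.

```python
import heapq
from collections import defaultdict


def network_distances(edges, source, bandwidth):
    """Dijkstra from `source`, keeping only nodes within `bandwidth` metres."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in adj[u]:
            nd = d + length
            if nd <= bandwidth and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def linear_nde(edges, events, source, bandwidth):
    """Events per linear km on the network reachable within `bandwidth`.

    edges:  list of (u, v, length_m), treated as undirected arcs
    events: list of (edge_index, offset_m measured from u along that arc)
    """
    inf = float("inf")
    dist = network_distances(edges, source, bandwidth)

    # Length of network reached: each arc can be entered from either end.
    reached_m = 0.0
    for u, v, length in edges:
        from_u = max(0.0, bandwidth - dist.get(u, inf))
        from_v = max(0.0, bandwidth - dist.get(v, inf))
        reached_m += min(length, from_u + from_v)

    # An event counts if it lies within the bandwidth via either end of its arc.
    count = 0
    for edge_index, offset in events:
        u, v, length = edges[edge_index]
        via_u = dist.get(u, inf) + offset
        via_v = dist.get(v, inf) + (length - offset)
        if min(via_u, via_v) <= bandwidth:
            count += 1

    return count / (reached_m / 1000.0) if reached_m > 0 else 0.0


# Toy network (hypothetical): A-B 100 m, B-C 100 m, B-D 50 m.
toy_edges = [("A", "B", 100.0), ("B", "C", 100.0), ("B", "D", 50.0)]
toy_events = [(0, 40.0), (1, 20.0), (2, 40.0)]
toy_density = linear_nde(toy_edges, toy_events, "A", 125.0)
```

In this toy network, 150 m of network are reachable within 125 m of node A and two of the three events fall on that portion, so the estimate is about 13.3 events per linear km. Because no distance weighting is applied, every reachable event counts equally.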
Figure 12 NDE on bank and insurance branches, 125 m bandwidth [linear density] (2D)
A second analysis produced a density estimate expressed in events per square kilometre (Figures 13 and 14). The results of the two analyses do not differ very much, but the 'linear density' analysis seems more suitable for highlighting clusters elongated along major streets or roads that stand out from the other clusters. The density analysis carried out for Swindon relied on a smaller dataset of banks and insurance companies. Although the size of the area and the overall length of the street network were quite similar to those of Trieste, the number of events was nearly one third lower than in the Italian case.
A uniform Kernel Density Estimation was performed over the banks and insurance companies of the central area of Swindon in order to compare the results with those from the Network Density Estimation (Figures 15 and 16). The same parameters used for the NDE on Trieste and Swindon were used, both in terms of bandwidth (125 m) and of the spatial resolution of the interpolation algorithm applied after the density estimation for the three-dimensional visualization (20 m cell size and IDW with power 1). The main clusters are visible in the same areas highlighted by the Network Density Estimation, that is, in the north-western and south-eastern parts of the study region. In this case, however, mainly circular clusters are visible, with two groups of three different-sized clusters in the two sub-regions. Comparing these clusters with those obtained via NDE shows that the uniform KDE produces circular shapes even where NDE shows elongated clusters oriented along streets and roads.
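For comparison, a uniform (naive) kernel estimate at a grid cell can be sketched as the number of events within a circle whose radius equals the bandwidth, divided by the area of that circle. The sketch below assumes planar coordinates in metres and evaluates the density at cell centres. The function names are illustrative, and neither edge correction nor the IDW interpolation used for visualization is included.

```python
import math


def grid_centres(rows, cols, cell_m, x0=0.0, y0=0.0):
    """Centres of a rows x cols grid of square cells, listed row by row."""
    return [
        (x0 + (c + 0.5) * cell_m, y0 + (r + 0.5) * cell_m)
        for r in range(rows)
        for c in range(cols)
    ]


def uniform_kde(events, centres, bandwidth_m):
    """Events per square km within `bandwidth_m` (Euclidean) of each centre."""
    circle_km2 = math.pi * (bandwidth_m / 1000.0) ** 2
    densities = []
    for cx, cy in centres:
        n = sum(
            1
            for ex, ey in events
            if math.hypot(ex - cx, ey - cy) <= bandwidth_m
        )
        densities.append(n / circle_km2)
    return densities


# Hypothetical events (metres) evaluated on a tiny 1 x 2 grid of 20 m cells.
sample_events = [(10.0, 10.0), (50.0, 10.0), (500.0, 500.0)]
sample_density = uniform_kde(sample_events, grid_centres(1, 2, 20.0), 125.0)
```

The search area here is always a full circle, regardless of where the streets run. This is why sparse events along a road tend to appear as separate circular clusters rather than as one elongated cluster.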
Figure 13 NDE on bank and insurance branches, 125 m bandwidth [area densities] (3D)
Figure 14 NDE on bank and insurance branches, 125 m bandwidth [area densities] (2D)
Figure 15 Uniform KDE on bank and insurance branches, 125 m bandwidth (3D)
As in the analysis performed on Trieste data, density values obtained from linear and areal NDE were compared to those obtained via a uniform KDE (Table 2). Comparisons between the results can be made in two directions, first considering the differences in density values in Swindon after performing areal NDE and uniform KDE and then comparing the linear NDE in the two cases of Trieste and Swindon.
Areal NDE applied to Swindon yields considerably higher values than uniform KDE in terms of absolute densities (maximum and minimum values) as well as mean and standard deviation, indicating that NDE limits the dilution of density that is more pronounced in KDE. Although the sample is smaller than in the case of Trieste, when applied to the Swindon study region NDE seems more proficient in highlighting 'hot spots' of clusters, particularly those located along arcs of streets and roads. This might also be a consequence of the network structure of the study region, where major streets and roads dominate the spatial pattern of the network and therefore the spatial distribution of the point events.
Figure 16 Uniform KDE on bank and insurance branches, 125 m bandwidth (2D)
Table 2 Comparison between Network Density Estimator using area and linear densities and uniform Kernel Density Estimator (Swindon, UK)
| | NDE (linear density) events per linear km | NDE (service area density) events per sq km | KDE (uniform) events per sq km |
|---|---|---|---|
| max | 12.50 | 971.68 | 207.36 |
| min | 1.05 | 32.60 | 20.74 |
| mean | 3.58 | 119.70 | 79.21 |
| st.dev | 1.95 | 98.51 | 54.56 |
| not null cells | 439 | 439 | 1010 |
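The figures in Table 2 can be put in proportion with a few lines of arithmetic. The snippet below is only a reading aid: the values are copied from the table, the total number of cells is derived from the 70 by 67 grid described for Swindon, and the variable names are arbitrary.

```python
# Values copied from Table 2 (Swindon).
table2 = {
    "linear_nde": {"max": 12.50, "min": 1.05, "mean": 3.58, "sd": 1.95, "not_null": 439},
    "areal_nde": {"max": 971.68, "min": 32.60, "mean": 119.70, "sd": 98.51, "not_null": 439},
    "uniform_kde": {"max": 207.36, "min": 20.74, "mean": 79.21, "sd": 54.56, "not_null": 1010},
}

# Grid of 70 rows and 67 columns of 20 m cells.
total_cells = 70 * 67

# Share of grid cells that receive a non-null density under each estimator.
share = {name: row["not_null"] / total_cells for name, row in table2.items()}

# Areal NDE relative to uniform KDE (both in events per sq km).
mean_ratio = table2["areal_nde"]["mean"] / table2["uniform_kde"]["mean"]
max_ratio = table2["areal_nde"]["max"] / table2["uniform_kde"]["max"]

print(f"non-null share, NDE: {share['areal_nde']:.1%}")
print(f"non-null share, KDE: {share['uniform_kde']:.1%}")
print(f"areal NDE / KDE, mean: {mean_ratio:.2f}, max: {max_ratio:.2f}")
```

Roughly 9% of the grid cells carry a value under NDE against about 22% under KDE, while the mean areal NDE is about 1.5 times the KDE mean. Density is therefore concentrated on fewer cells close to the network.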
When comparing the results obtained by applying the linear NDE to Trieste and Swindon, it can be seen that they are quite similar in terms of maximum and minimum density values as well as mean and standard deviation, although both the events and the not null cells are fewer in Swindon than in Trieste. Areal NDE, however, presents higher values in Swindon than in Trieste, while the opposite is true for uniform KDE. Differences in the network structures of the two study regions play an important role in these results, as does the different number of events in the two sample datasets. The study region in Trieste is largely dominated by a Manhattan-like network structure, with several banks and insurance companies distributed in the central area of the city in a sort of 'financial district'. The compactness of the street network and of the distribution of events makes it more difficult to differentiate the results from NDE and KDE, although linear clusters along major roads can be noticed. In the study area of Swindon it is more difficult to identify a single 'financial district' defined by a pure concentration of banks and insurance companies. Both the point events dataset and the network structure are less compact, with banks and insurance companies located quite clearly along a few streets that appear as the 'backbone' of the network structure of the area considered. In such a network structure it is therefore easier to obtain different results from NDE and uniform KDE in terms of density values and linear clusters, and therefore to confirm the suitability of NDE for such analyses.
The NDE is not an alternative to KDE but a network-led complement to it for understanding human phenomena in the urban and extra-urban environment. Further implementations could consider restrictions on road networks, as well as different weightings of the arcs belonging to the network, in order to account for different cost functions attributed to them, for morphological impediments, or for individual perceptions of the network.
Further research is needed to examine different network configurations and characteristics, in different cities with less well-structured street networks, and using more refined search functions, including weighting schemes for the events. Further developments might also include procedures for defining bandwidths for Network Density Estimation. As in other research on networks, the procedures to define the bandwidth should consider the structure of the network where the analysis is carried out. The application of methods based on inter-event distances, as in the case of nearest neighbour functions, should therefore be based on the network structure. The research presented here aims to link first and second order properties of point patterns in a network-constrained environment. A final note regards the functions to be used for performing the Network Density Estimation: further developments of the NDE are needed to consider distance-based weighting functions in a network environment, similar to the normal, quartic or triangular ones already implemented in kernel density analysis.
In this article network spatial analysis was considered with particular reference to the first order effects of the distribution of events over a study region characterized by the presence of a street network. As human activities usually take place in network-organized spaces, there is a need to refine the search functions of traditional methods for analysing regularity or clustering in the distribution of events, using network distances rather than Euclidean ones. The procedure presented is called Network Density Estimation (NDE) and is inspired by Kernel Density Estimation (KDE) as a method for analysing the local spatial distribution of event processes placed on a network in a study region. The main characteristic of the function is that it uses shortest-path trees rather than circular search functions for the density analysis, therefore counting only events that can be reached along the network’s segments.
A density analysis can be used to determine the concentration of ‘central’ human activities in an urban environment, allowing the visualization of denser areas of activity and helping in the analysis of the Central Business District. The resulting three-dimensional surface also shows the gradient of urban density functions, which decrease from the central areas of the city in terms of urban land use and its value (Knos 1962, Haggett 2001) and of population distribution (Yeates and Gardner 1976). Given that the structure of urban areas is to some extent delineated by the orientation of the street and road network pattern, considering a 'network space' consisting of the subset of the geographical space close to the network itself can be useful in understanding the spatial organization of some human activities. From this vantage point, a network-based density estimator seems promising for exploring issues related to urban forms and functions.
The method can be implemented using standard GIS and spatial statistics functions and has been illustrated using two datasets consisting of banks and insurance companies as samples of CBD activities (Murphy and Vance 1954b) located in the central areas of the cities of Trieste, Italy and Swindon, UK. The analysis was performed using NDE and a more traditional KDE in order to provide comparisons between the two methods. Comparisons were also made between two flavours of Network Density Estimation, namely a linear density estimation and an areal one. Although the results obtained using NDE do not differ greatly from those of conventional KDE, NDE seems more proficient in highlighting 'linear' clusters oriented along a street network.
In the case studies KDE and NDE do not differ very much, although NDE is more promising than KDE where the network structure is led by some major roads that guide the development of activities (e.g. shops in high streets, out-of-town retail activities) and point events tend to be distributed along them. In such cases KDE would create elongated clusters only when events are very close to each other, while sparser events, even along the same road, would form circular clusters. NDE can also handle events distributed in a compact network environment and is better suited to highlighting linear clusters, as it draws a clustering pattern more consistent with the orientation of the network. In particular, NDE and KDE provide similar results when the network structure is quite compact and regular, as generally happens in the central parts of grid-based cities, and point events are quite evenly distributed in the area. Larger differences are visible when a single road or a set of major roads has a marked influence and a number of events are distributed along them. The procedure needs to be tested on different datasets of point events that take place on networks, and comparisons need to be made between different urban areas and between different parts of the same urban area, in order to analyse the effects of different network structures on the estimator.
In this article the method implemented resembles the more general naive estimator, and therefore does not consider a 'distance decay' effect for events farther from the estimation points. Although such a naive-like Network Density Estimator tends to provide less smoothed surfaces and does not account for the effect of distance on the density estimate, it provides a first basis for comparison with more traditional Euclidean density estimators.
Further developments of the estimator should therefore consider the application of different functions in the network-constrained environment (e.g. quartic or normal kernels) in order to enhance the performance of the estimator in identifying clusters along networks and to weight close and distant events in different ways. That would allow a better visualization of the differences in clustering between the Euclidean and network environments.
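As an illustration of what such distance-based weighting might look like, the sketch below defines unnormalised uniform, triangular, quartic and normal weights as functions of a network distance d and a bandwidth h. These are the familiar Euclidean kernel shapes applied to network distances as an assumption. Normalising constants appropriate to a network are not addressed, and treating h as the standard deviation of the normal weight is a choice made here for illustration only.

```python
import math


def uniform_weight(d, h):
    """Naive estimator: every event within the bandwidth counts as 1."""
    return 1.0 if d <= h else 0.0


def triangular_weight(d, h):
    """Weight falls linearly from 1 at d = 0 to 0 at d = h."""
    return max(0.0, 1.0 - d / h)


def quartic_weight(d, h):
    """Quartic (biweight) shape, zero beyond the bandwidth."""
    return (1.0 - (d / h) ** 2) ** 2 if d <= h else 0.0


def normal_weight(d, h):
    """Gaussian shape with h used as the standard deviation (assumption)."""
    return math.exp(-0.5 * (d / h) ** 2)


def weighted_count(network_distances_m, h, weight):
    """Sum of weights over events at the given network distances."""
    return sum(weight(d, h) for d in network_distances_m)


# Hypothetical network distances (metres) from a reference point to three events.
distances = [0.0, 62.5, 200.0]
naive_total = weighted_count(distances, 125.0, uniform_weight)
quartic_total = weighted_count(distances, 125.0, quartic_weight)
```

With a 125 m bandwidth the uniform weight counts the two nearest events equally, while the quartic weight gives the event at 62.5 m a weight of 0.5625. Replacing the raw count of reachable events with such a weighted sum is one way a distance decay effect could be introduced.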